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Summary. Principal stratification is a causal framework to analyze randomized experiments with a 
post-treatment variable between the treatment and endpoint variables. Because the principal strata 
defined by the potential outcomes of the post-treatment variable are not observable, we generally can¬ 
not identify the causal effects within principal strata. Motivated by a real data set of phase III adjuvant 
colon clinical trials, we propose approaches to identifying and estimating the principal causal effects via 
multiple trials. For the identifiability, we remove the commonly-used exclusion restriction assumption by 
stipulating that the principal causal effects are homogeneous across these trials. To remove another 
commonly-used monotonicity assumption, we give a necessary condition for the local identifiability, 
which requires at least three trials. Applying our approaches to the data from adjuvant colon clinical 
trials, we find that the commonly-used monotonicity assumption is untenable, and disease-free survival 
with three-year follow-up is a valid surrogate endpoint for overall survival with five-year follow-up, which 
satisfies both the causal necessity and the causal sufficiency. We also propose a sensitivity analysis 
approach based on Bayesian hierarchical models to investigate the impact of the deviation from the 
homogeneity assumption. 

Keywords: Causal inference; Causal necessity; Causal sufficiency; Clinical trial; Criterion for 
surrogate endpoints 


1. Introduction 


1.1. Principal stratification and surrogate endpoints 

Causal effects are defined as comparisons between potential outcomes of the endpoint variable 
under treatment and control for the same group of individuals. To evaluate the causal effects on the 
endpoint within subpopulations stratified by a post-treatment variable, Frangakis and Rubin (2002) 
proposed the principal stratification framework in which the subpopulations are defined by the joint 
values of the potential outcomes instead of the observed value of the post-treatment variable. The 
joint potential outcomes of the post-treatment variable can be viewed as a pretreatment covariate 
vector unaffected by the treatment. Principal stratification has several applications in the current 
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literature. The compliance behavior defined by the potential treatment acceptances is used to 


address the noncompliance problem (Angrist et al. 1996), the potential survival status is used to 


evaluate the effect on the quality of life with truncation by death (Rubin 2006 Ding et al. 2011 


Yang and Small 2015), and the potential employment status is used to deal with the truncation of 


wages due to unemployment in the evaluation of job-training programs (Zhang and Rubin 2003 


Zhang et al. , 

2009 

Frumento et al. 

2012 

). The potential response indicators are used to address 

non-ignorable nonresponse problems 

(Frangakis and Rubin 1999, Mealli and Pacini 

OO 

o 

o 

CM 

Mattei 


et al. 2014), and the potential intermediate variables are used to define direct and indirect effects 


Rubin 

o 

o 

CM 

Gallop et al. 

2009 


Mattei and Mealli 2011). 


Another important application of principal stratification is to evaluate surrogate endpoints in 
clinical trials (Frangakis and Rubin, 2002). When direct measurement of an endpoint of interest 


is too time-consuming or costly, we try to measure a surrogate for the endpoint. There have been 


some criteria for judging surrogates from different points of view. Prentice (1989) first proposed the 


statistical surrogate criterion, which requires conditional independence of the treatment and the 
observed endpoint given the observed surrogate. From a causal perspective, [Frangakis and Rubin| 
(2002) proposed the principal surrogate criterion, and pointed out that the statistical surrogate does 


not satisfy the causal necessity, i.e., no difference between the potential outcomes of the surrogate 


implies no difference between the potential outcomes of the endpoint. Lauritzen (20041 proposed 


the strong surrogate criterion depicted by a causal diagram requiring the surrogate break the causal 


path from the treatment to the endpoint, which is stronger than the principal surrogate. Gilbert and 


Hudgens (2008) argued that a principal surrogate should satisfy not only causal necessity but also 


causal sufficiency. The causal sufficiency requires that if the treatment effect on the surrogate is non¬ 


zero, then the treatment effect on the endpoint is also non-zero. Joffe and Greene (2009) considered 


four different approaches for evaluating surrogate endpoints. However, all of these approaches may 
suffer from the surrogate paradox as pointed out by Chen et al. (2007), Ju and Geng ( 2010| ) and 
VanderWeele (2013). That is, for a surrogate satisfying any of these criteria, it is possible that the 


treatment has a negative average causal effect on the endpoint even if the treatment has a positive 
average causal effect on the surrogate and the surrogate has a positive average causal effect on the 
endpoint. 


For evaluating surrogates, Mealli and Mattei (20121 suggested conducting a principal stratifica¬ 


tion analysis investigating the effect of the treatment within all principal strata. However, because 
we cannot simultaneously observe two potential outcomes of the post-treatment variable, we do 
not know the principal stratum of an individual, and generally we cannot identify causal effects 
within principal strata. Zhang and Rubin (2003) and Cheng and Small (20061 proposed large sam¬ 


ple bounds of causal effects within some principal strata, which, however, may be too wide to be 
informative. Angrist et al. (1996) discussed the identifiability of the complier average causal effect 
under the monotonicity and exclusion restriction assumptions. Without the exclusion restriction 


assumption, Zhang et al. (2009) used Gaussian mixture models to identify causal effects within 


principal strata. Gilbert and Hudgens (2008) and Huang and Gilbert (20111 proposed approaches 


to evaluating surrogates based on principal stratification in a single trial, but they assumed constant 
potential outcomes of the surrogate under control. Zigler and Belin (2012) proposed a Bayesian 


approach to estimating the causal effects on the endpoint within principal strata without the mono¬ 
tonicity assumption, but their approach relied on prior distributions on some parameters that are 
not identifiable. 
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1.2. Motivation 


Our study is motivated by the data from phase III adjuvant colon clinical trials (ACCTs). The 
goal of the ACCTs is to test whether disease-free survival (DFS) with three-year follow-up can be 
used as a surrogate for the overall survival (OS) with five-year follow-up ( Sargent et al.| 2005). The 
data are collected from 10 clinical trials. Sargent et al. (2005) found a strong correlation between 
the hazard ratio of the treatment on DFS and the hazard ratio of the treatment on OS. Applying a 
meta-analysis, Baker et al. (2012) used DFS as a principal surrogate to predict the treatment effect 
on the endpoint using the treatment effect on the surrogate. They assumed that the true endpoint 
does not depend on the treatment among subjects whose surrogate is the same under treatment 
and control, i.e., the causal necessity holds. However, the causal necessity used by Baker et al. 


(2012) has not been verified by either previous studies or observed data. 

For the ACCTs data, the monotonicity assumption requiring nonnegative individual causal 
effect on the surrogate may not hold, because the treatment may have negative side-effects on 
the surrogate for some patients. Second, the exclusion restriction assumption implies the causal 
necessity, which is the scientific question of interest in evaluating surrogate endpoints. Third, 
Gaussian mixture models are not applicable to the binary endpoints in the ACCTs data. The meta¬ 


analysis approaches (Daniels and Hughes 


from the surrogate paradox (VanderWeele 
assumption does not hold 


1997 


Li et al. 2011 Baker et al. 20121 may also suffer 


2013). When the monotonicity or exclusion restriction 


Ding et al. ( 

2011 

, Mealli and Pacini 

(20131, and 

Yang and Small 


(2015) achieved partial and point identification of principal causal effects by exploiting covariates or 
secondary outcomes, and Mattei et al. ( |2013 ) improved Bayesian inference for principal causal effects 
using multiple outcomes. However, when additional outcomes or covariates are not available, we 
cannot identify causal effects using previous approaches. Without identifiability, Bayesian inference 
is sensitive to prior distributions (Gustafson 2009). 


In this paper, we propose approaches to identifying the principal stratification causal effects 
without the exclusion restriction assumption and further without the monotonicity assumption. To 
remove the exclusion restriction assumption, we need at least two trials. Furthermore, to remove 
the monotonicity assumption, we give a sufficient condition for the local identifiability of the causal 
effects, which requires at least three trials. We first assume that these trials are homogeneous. Using 
the identified principal stratification causal effects, we can test the causal necessity and the causal 
sufficiency of a surrogate. We evaluate the treatment effect on the unobserved endpoint in a new 
trial where the distribution of principal strata and/or the treatment may be different from those 
in the validation trials. We then propose criteria for surrogates that avoid the surrogate paradox. 
Allowing for possible deviations from the homogeneity assumption, we then conduct a sensitivity 
analysis based on a class of Bayesian hierarchical models. We apply the proposed approaches to 
the ACCTs data and evaluate the surrogacy of DFS with three-year follow-up for OS with five-year 
follow-up. 


The paper proceeds as follows. We introduce the notation and assumptions in Section [2] We 
present the identification conditions with and without the monotonicity assumption in Section [3] 
Section [4] discusses the evaluation of surrogate based on principal stratification. We evaluate the 
performance of our approaches with finite sample sizes in Section [5] via simulation studies. We 
apply our approaches to the ACCTs data in Section [fij and perform a sensitivity analysis using 
Bayesian hierarchical models in Section [7J We conclude with a discussion in Section [8] We present 
the details of the proofs and computations in the online supporting materials, and provide the data 
and compute code online. 













































2. Notation and assumptions 


2.1. Potential outcomes and principal stratification 

Suppose that there are Nr independent trials which have the same treatment and the same control. 
Let R denote the trial number taking values from 1 to Nr. Let Z be a binary treatment assignment 
with Z = 1 for treatment and 0 for control. Let S denote a binary surrogate and Y a binary endpoint 
of interest. In the ACCTs data, Si = 0 if the cancer of patient i reoccurred before 3 years, and Si = 1 
otherwise. The endpoint Y z denotes the survival status within 5 years, with Yi = 1 for “survival” 
and Yi = 0 for “dead.” We use the potential outcomes framework to define causal effects and make 
the Stable Unit Treatment Value Assumption (SUTVA) throughout the paper, i.e., there is only 


one version of the potential outcomes and there is no interference between units (Rubin 1980). The 


SUTVA allows us to uniquely define the potential outcomes Si(z) and Yi(z) for the surrogate and 
the endpoint variables if patient i were to receive treatment 2 for 2 = 0 and 1. The observed values of 
S and Y are deterministic functions of both the treatment assignment and the potential outcomes, 
i.e., Si = Si(Zi) = ZiSi{ 1) + (1 - Zi)Si{ 0) and Y = Y(Zi) = Z,Y( 1) + (1 - Z X )Y X ( 0). Throughout 
our paper, we assume that {(Ri, Zi , S'j(l), <S)(0), V(l), V(0)) :i = 1,..., N} are independently and 
identically distributed (iid) samples drawn from an infinite super-population, and thus the observed 
{(Ri, Z i: Si, Yi) : i = 1 are also iid. Although we will discuss only the binary endpoints 

in depth, our framework and approaches are also applicable to general continuous endpoints by 
dichotomizing the endpoints to identify the distributional causal effects. 


Frangakis and Rubin (2002) defined principal stratification using the joint potential outcomes 
of the surrogate under both treatment and control, i.e., Ui = (iS)(l), Si(0)). For simplicity, we 
relabel the possible values of U, (1,1), (1,0), (0,1) and (0,0), as ss,ss,ss and ss, respectively. In 
our motivation example, we use “s” for “disease-free survival in three-year follow-up,” and “s” 
otherwise. We are interested in the principal stratification average causal effects (PSACEs) for 
each trial, defined as 

ACE ur = E{Y( 1) - V(0) | U = u,R = r} 

for u = ss, ss, ss, as, and r = 1, ..., Nr. Based on the PSACEs, we can formally define the causal 
necessity and causal sufficiency (Gilbert and Hudgens 2008). 

Definition 1. Causal necessity requires ACE ur = 0 for u = ss,ss, and causal sufficiency 
requires ACE ur 0 for u = ss, ss, where r = 1,..., Nr. 

The above definition of causal necessity is weaker than the usual exclusion restriction assumption 


(Angrist et al. 19961 that requires zero individual causal effect on the outcome for principal strata 


u = ss and ss. 


2.2. Assumptions 

In this subsection, we introduce the basic assumptions, which are commonly used in causal inference. 
By identification of parameters, we mean that they can be expressed as functions of the distributions 
of observed variables. We let A\\_B \ C denote the conditional independence of A and B given C. 

Assumption 1 . (Randomization). ZlL{S , (l),5'(0), V(l), V(0)} | R. 

Assumption [l] means that treatment assignment Z is randomized within each trial, but the 
assignment probabilities may be different. Because we have independent randomized clinical trials 
in the ACCTs data, Assumption [T] holds by the designs of experiments. 
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The following monotonicity assumption is widely used for principal stratification analysis (An- 


grist et al. 

1996 

and principal surrogate evaluation (Gilbert and Hudgens 

2008) 


Assumption 2. (Monotonicity). S{(1) > Si( 0) for each individual i. 


Assumption [2] means that the treated surrogate endpoint 5} (1) is always better than or equal 
to the controlled surrogate endpoint S}(0) for every patient i. The monotonicity assumption may 
be too restrictive for some real applications where treatment may have negative side-effects on 
some patients. We shall discuss the identification of PSACEs with and without the nronotonicity 
assumption separately. 

Let 0(z, s) denote the set of principal strata that are compatible with the observed values 
Zi = z and Si = s. Then we have 0(1,1) = {ss,ss}, 0(1,0) = {ss, ss}, 0(0,1) = {ss, ss}, and 
0(0, 0) = {ss, ss} without the monotonicity assumption. The nronotonicity assumption eliminates 
the stratum ss, and therefore 0(1,1) = {ss,ss}, 0(1,0) = {ss}, 0(0,1) = {ss}, and 0(0,0) = 
{ss, ss}. Although the principal stratification variable U is not observable for all units, nronotonicity 
allows us to identify the proportions of principal strata using the distribution of the observed data. 
Define 

TTur = P(U = u I R = r), 

which can be identified by Tr S g,r = P{S = 0 | Z = 1, R = r), 7 t SS jT . = P(S = l\ Z = Q,R = r) and 
Tt S s,r = 1 — rr ss r — TTss,r- With monotonicity, we can also identify the expectations E{Y( 0) | U = 
ss , R = r} = E(Y | Z = 0, S = 1, R = r) and E{Y( 1) \U = ss,R=r} = E(Y \ Z = 1, S = 0, R = 
r). Since patients within the observed group (Z = 1, S = 1) or (Z = 0, S = 0) are both mixtures 
of two latent principal strata, we cannot identify the PSACEs without further assumptions beyond 
Assumptions [T] and [2j We can obtain large sample bounds for the PSACEs from the observed data, 
but these bounds are barely informative as shown in the online Appendix. 


3. Identification of the PSACEs from multiple trials 

Because the principal stratum is unobservable, causal effects within principal strata are not iden¬ 
tifiable in general. In this section, we shall propose approaches to improving the identifiability of 
the PSACEs in terms of multiple trials. To combine information from multiple trials, we make the 
following homogeneity assumption. 

Assumption 3. (Homogeneity). R]\Y(z) | U for z = 0,1. 

The homogeneity assumption means that the potential outcome Y (z) may depend on the prin¬ 
cipal stratum U , but is independent of the trial number R conditional on the principal stratum 
U. Thus, there is no “direct effect” of R on Y(z) within the principal stratum U. We can include 
pretreatment covariates A' to make this assumption more plausible, i.e., R\YY(z) \ (U,X), and we 
omit it for simplicity. As an example, Assumption [3] for the ACCTs data means that the survival 
of a patient under treatment or control does not depend on which trial he/she is in once we know 
his/her principal stratum U , defined by the reoccurrence of cancer under both treatment and con¬ 
trol. In particular, for the patients in stratum U = ss, the cancer will not reoccur within 3 years no 
matter whether they receive the treatment or not. Thus the principal stratum can be interpreted 
as a measure of the physical status of a patient. It may be plausible that the survival will no 
longer depend on the trial number after we know a patient’s physical status and the cancer status. 









Recognizing that Assumption [3] may not be directly testable and may be violated in practice, we 
propose an approach for conducting sensitivity analysis about Assumption [3] in Section [7] 

The following result further explains the homogeneity of causal effects across trials. 

Proposition 1. Under Assumption [7J Assumption^ is equivalent to R]\Y | ( U, Z ), which 
implies ACE ur = ACE ur i for all u = ss, ss, ss, ss and r ^ r'. 

By Proposition |TJ we can write ACE U = ACE ur , and thus we can discuss the identifiability of 
the common PSACEs using the data from multiple trials together. 

To simplify the notation, we define p r = P(R = r) as the proportion of trial r, a r = P(Z = 
1 | R = r) as the proportion of patients receiving treatment in trial r, S zur = P(Y = 1 | Z = 
z, U = u, R = r) as the survival proportion conditional on the treatment z, principal stratum u 
and trial number r. Under the homogeneity assumption, we have that S zu = 8 zur . We further 
define p = {p r : r = 1 a = {a r : r = l,...,N R }, 7iy = {Tr ss , r ,^ss,r,^ss,rU b^}, 

7r = {-ny : r = 1,..., N R }, and S = { 8 ZU : z = 0,1; u = ss, ss, ss, ss}. Under monotonicity, we do 
not have the parameters 8 Z ^ S . and we have 7rj Sjr = 0. 


3.1. Identification with monotonicity 

In addition to the homogeneity assumption, we need the following assumption. 

Assumption 4. (a) There exist at least two trials ry andr 2 , such thatn SStri /Tr S s, ri 7^ 7 r 5 Sjr 2 / 7 r S s, r2 . 
(h) There exist at least two trials r% and r 4 , such that 'K S s,r 3 /'^ss,r 3 7 ^ ^as^i/^ss^i- 

Again, treating the principal stratum as a measure of patients’ physical status, Assumption [4] 
means that patients in different trials have different distributions of the patients’ physical status. 
Under monotonicity, Assumption [4] is testable by data because we can identify the proportions of 
the principal strata under Assumption [2] as shown in Section 2.2. 

The trial number R acts like an instrumental variable associated with U in the sense that 
Assumption 3 is similar to the exclusion restriction assumption, and Assumption 4 guarantees the 
association between R and U. Under Assumptions 3 and 4, there is no “direct effect” of R on 
Y , which is similar to the instrumental variable case where the instrumental variable does not 
“directly” affect the outcome. 

Theorem 1. Under Assumptions [7] to [31 we have that, for u = ss, ss and ss, 

(a) P(Y = 1 | Z = 1, U = u) for all principal strata u are identifiable if Assumption^a) holds; 

(b) P(Y = 1 | Z = 0, U = u) for all principal strata u are identifiable if Assumption^b) holds; 

(c) ACE U for all principal strata u are identifiable if both Assumptions^ a) hold. 

Assumption [4] requires at least two independent trials for the identifiability. We can use any 
two trials satisfying Assumption [4] to obtain moment estimators, and we can also use all trials to 
obtain more efficient maximum likelihood estimates (MLEs). The following example illustrates the 
identifiability for the case with two independent trials (N R = 2) under Assumptions |T] to [2] 
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Example 1. First, the proportions of the principal strata ir ur are identifiable under monotonic¬ 
ity, as discussed in Section 2.2. 

Second, the probability ui ys \ zr = P{Y = y,S = s \ Z = z,R = r) can be identified from 
the observed data. We can directly identify two outcome distributions 6 = Wio|ii/tTss,i and 
^0,ss = w ll|01 l^ss.l- 

Third, if n ss ,i/n s s,i Trss^/Trssp, we have from the proof of Theorem^that 

x w ll|ll ■ n ss,2 ^ll|12 ■ x 11|12 ' ^ss.l — w ll|ll ' 7T SS ,2 

C /2 SS ? ^ 1 S 5 * 

7Tss,l • 7fss,2 ^ss,2 ’ 7Tss,l ’ ^ss ,2 ^ss ,2 * 7Tss,l 

If ^ssp/tTss,! 7 ^ tTssp/tTss^, we have from the proof of Theorem^ that 

x Wio|01 • TTss ,2 — Wio|02 • ^ss.l „ Wio|02 ‘ ^ss,! — ^10|01 ' 77 ss,2 

VQ,SS 5 ^0,SS — • 

7Tss,l ‘ T^ss ,2 ^ss,2 * 'TI’ss,! 7Tss,l * ^ss ,2 ^ss,2 ‘ ^ss,l 


3 . 2 . Identification without monotonicity 

Monotonicity for every individual may be too restrictive in practice. In this subsection, with the 
help of more than two independent trials, we can remove monotonicity. 

By the homogeneity assumption, the probabilities of the observed data can be decomposed as 


P(Z = z,S = s,Y = y | R = r) 

J2 P(Z = z \ R = r) ■ n ur ■ P(Y = y \ Z = z,U = u), (1) 

u£0(z,s) 


for z, s,y = 0,1. Let £ = (p,a,n,6) denote the vector of the parameters. Let / be the vector 
of probabilities on the left-hand side of equation (B.l). Applying a Taylor expansion, we can 
approximate /(£) by linear equations of £ around the true parameters £o, be., /(£) ~ /(£o)+V/ |^ 0 
— £o), where V/ |^ 0 is the Jacobian matrix with the (*, j)-th element dfi/dfj |^ 0 . According to 


Roche et al. (1997), a distribution Fw{w,$,) of a random variable W is locally identifiable at £o if 
there exists some neighborhood N(£ 0 ) of £ 0 such that for all £ £ iV(£ 0 ), F w (w,£ 0 ) = F w (w,£) 
for all w if and only if £ = £o- Therefore, under the randomization and homogeneity assumptions, 
the parameter vector £ can be represented by a function of the distributions of observed variables 
and thus it is locally identifiable if the Jacobian matrix V/ |^ 0 is of full column rank. In fact, 
Assumption [4] is a necessary condition for V/ |^ 0 to be full column rank. In practice, the full rank 
condition of V/ |^ 0 can be tested empirically by the rank of V/ |, where £ is the MLE of £o 


(Goodman 1974 Skrondal and Rabe-Hesketh 20041. 


A necessary condition for local identifiability of unknown parameters is that the number of 
observed frequencies is larger than the number of unknown parameters. For this case, we have the 
following result. 

Proposition 2. Under Assumptions [7] and [3| a necessary condition for the local identifiability 
of the joint distribution of ( Y , Z, S, U, R) is Nr > 3. 


Intuitively, when Nr < 2, there will be less equations than unknown parameters in the equations 
in (B.l), and thus we cannot obtain unique solution of the unknown parameters. Therefore, there 
must be at least three independent trials to identify the PSACEs without monotonicity. In the case 
without monotonicity, we can not obtain closed forms of PSACEs like Example 1, because we are 


















not able to obtain closed forms of 7r ur ’s. We can use a numerical approach to calculate the PS ACEs, 
and we give an example in the online Appendix to illustrate the identifiability for the case without 
monotonicity. 

Without monotonicity, we can prove only local identification. If a parameter is locally identifiable 
but not globally identifiable, its posterior distribution must be multimodal and has the same value 
at multiple modes. Thus, we can check its posterior distribution to verify identifiability. In our 
simulation studies and application, posterior distributions of all the parameters are unimodal, which 
means that the parameters are indeed globally identifiable. In general, we can test whether the 
posterior distributions of parameters are unimodal using existing methods. 


3 . 3 . Computation, model checking, and goodness-of-flt tests 

In this subsection, we shall discuss the estimation of the parameters. We can use the expectation- 
maximization (EM) algorithm and the Gibbs Sampler to compute the MLEs and simulate the 
posterior distributions of PSACEs. Suppose that R follows a categorical distribution with param¬ 
eter p, Z follows Bernoulli distributions conditional on R with parameter a, U follows categorical 
distributions conditional on R with parameter 7r, and Y follows Bernoulli distributions conditional 
on Z and U with parameter 5. The unobserved variable U is treated as a latent variable. The 
complete data can be represented by a contingency table classified by (Z,U,Y,R), and the ob¬ 
served data can be represented by a contingency table classified by (Z, S, Y, R) with cell counts 
N Z sy r = #{* : Zi = z, Si = s,Yi = y,Ri = r}. For Bayesian inference, we use Dirichlet (or Beta) 
distributions with parameters (1,..., 1) as the non-informative prior distributions of p, a., n and 
6 . Computational details are given in the online Appendix. 

More interestingly, when Nr > 2 with monotonicity or Nr > 3 for without monotonicity, 
the extra degrees of freedom allow us to perform goodness-of-fit tests based on the asymptotic 
distributions of the likelihood ratio statistics: 


T m (N zsyr ) = 2 log 


Tnm(N zsyr ) = 2 log 


Y(Cs | A zsyr) t 

L (Cm | N zsyr ) 
L(C I N zsyr ) 


X4JVb-6) 


L{£ 


nm | A- S y r ) 


X 3 Nr — 8 7 


where Cs is the MLE under the saturated model, i.e., (Z, S, R, Y) follows the multinomial distribu¬ 
tion without constraints on parameters, and and £ nm are the MLEs with and without monotonic¬ 
ity. The degrees of freedom with and without monotonicity are (8 Nr — 1) — (4 Nr + 5) = (4 Nr — 6) 
and (8 Nr — 1) — (5 Nr + 7) = (3 Nr — 8), respectively. On the other hand, we can use Bayesian 

denote 


posterior predictive p -values (ppp) for model checking (Rubin, 1984 Meng 
the replicate data generated from their posterior predictive distributions, 
with and without monotonicity are 


1994). Let 
The ppp’s for the models 


PPPm = P{Tm{N zsyr ) > I N zsyr}, 

PPPnm = P\Tnm{N Z syr) ^ Prim (■ K?yr) I N z syr} 5 


which can be approximated by the posterior draws from the Gibbs Sampler. 
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4. Evaluation of surrogate based on principal stratification 

In the previous section, we proposed approaches to identifying the PSACEs using multiple trials. 
In this section, our goal is to predict the treatment effect on the endpoint both quantitatively and 
qualitatively, based on the treatment effect on the surrogate in a new trial without observing the 
endpoint. We consider two cases for applying the principal surrogate to a new trial in which the 
endpoint is not observed. 


Case 1. The surrogate is applied to a new population where the treatment is the same, but the 
distribution of principal strata differs from the validation trials. For example, in validation 
trials, we study the effect of drug A on cardiovascular disease, and use an indicator for 
whether the level of cholesterol increases as a candidate surrogate. Based on the PSACEs 
estimated from validation trials, we use the surrogate to predict the effect of the drug A 
on the disease of patients in a different population. 

Case 2. The surrogate is applied to a new population where both the treatment and the distribution 
of principal strata may be different from those in the validation trials. For example, based 
on the PSACEs for drug A estimated from validation trials, we use the surrogate to predict 
the effect of a new drug B on the disease of patients in a different population. 


For quantitative evaluation, we assume that the PSACEs are the same across the validation 
trials and the new trial, then try to calculate the treatment effect on the endpoint. For qualitative 
evaluation, we relax this assumption so that the PSACEs have the same signs across the validation 
trials and the new trial. Then we try to get the sign of the the treatment effect on the endpoint. 


VanderWeele (2013) points out that a principal surrogate satisfying the causal necessity ( ACE U = 


0 for it = ss, ss) and the causal sufficiency ( ACE U 0 for u = ss, ss) may not avoid the surrogate 
paradox. In the following example, we further illustrate that a principal surrogate verified in a val¬ 
idation trial may not be used to correctly evaluate the treatment effect on the unobserved endpoint 
in a new trial, even if the new trial and the validation trials have the same PSACEs. 


Example 2. Consider a surrogate S which is evaluated by validation trial r. The surrogate 
satisfies the causal necessity with ACE SS = ACE S g = 0, and the causal sufficiency with ACE sS = 0.4 
and ACE S s = —0.6. Suppose that the distribution of principal strata in the validation trial is 
7 r ss r = TT S s,r = 0.2, 7T sS>r = 0.4 and TTs S ,r = 0.2. Then we have E{S(1) — S(0) \ R = r} = 0.2 > 0 and 
E{y(l) — E(0) | R = r} = 0.04 > 0, which means that the surrogate paradox is avoided in validation 
trial r. Now suppose that a new trial R = r' has the same PSACEs as the validation trial. But 
the distribution of principal strata differs: n ss y = 0.1, nggy = 0.2, n sS y = 0.4, and 7 T Ss y = 0.3. 
Then we have E{S{ 1) - 5(0) | R = r'} = 0.1 > 0 but E{Y{ 1) - F(0) | R = r'} = -0.02 < 0. The 
surrogate paradox arises, and thus we cannot use the surrogate S to correctly evaluate the treatment 
effect on the endpoint Y in the new trial. 

Now we discuss the conditions to avoid the surrogate paradox when a validated surrogate is 
applied to a new trial with a new treatment or a new population. Let ACEf = E{S{ 1) — 5(0) | 
R = r} and ACE) = E{F(1) — T(0) | R = r}. Without loss of generality, we assume ACEff > 0. 
For the case with ACEf < 0, we can redefine the surrogate as 5* = 1 — 5. Below we give the 
relationships among the average causal effects ACE^ ,ACEf and ACE ur for trial r. 

Proposition 3. For trial r, we assume that the surrogate satisfies the causal necessity. 





(i) With monotonicity, we have ACE? = ACE? x ACE sS} r- 

(ii) Without monotonicitu, suppose ACE? > 0, we have the lower and upper bounds of ACE?: 
ifACE sS , r + ACE SStr > 0, 

ACE? x ACE^r < ACE? < (. ACE sS ,r + ACE Ss , r )/2 + ACE? x (ACE aS , r - ACE Sa , r )/2, 
and otherwise 

(. ACE aSt r + ACE Ss ,r)/ 2 + ACE ? x (ACE aStr - ACE Ss , r )/ 2 < ACE? < ACE? x ACE S - S ,r- 

Proposition [3] shows the relationships between the treatment effect on the surrogate and the 
treatment effect on the endpoint, which immediately give us the following implication relationships 
when the endpoint Y is unobservable. 

Corollary 1. For trial r, we assume that the surroqate satisfies the causal necessity and 
ACE S s,r >0. 

(i) With monotonicity, ACE? > 0 (or = 0) implies ACE ? >0 (or = 0). 

(ii) Without monotonicity, ACE sS:r + ACE Ss , r > 0 and ACE? > 0 imply ACE? > 0. 

In Corollary [Ijfz), with monotonicity, a principal surrogate satisfying the causal sufficiency can 
avoid the surrogate paradox. Without monotonicity, the above result (i) does not hold since the 
causal effects on the endpoint Y may be different in the principal strata u = ss and ss. Thus, we 
require the condition ACE s s, r + ACE Ss , r > 0 in (ii) of Corollary [lj i.e., the positive causal effect 
on the endpoint in stratum ss can offset the negative causal effect on the endpoint in stratum ss. 

Proposision [3] and Corollary [l] can help us to assess the treatment effect on the endpoint in a 
new trial both quantitatively and qualitatively. Below we illustrate this with Examples 3 and 4 for 
Cases 1 and 2 respectively. 

Example 3. Suppose that in the validation trial, we studied drug A and obtained estimates 
ACE SS}T = ACE SSt r = 0, ACE aSjr = 0.5 and ACEg S ^ r = —0.4. For Case 1, we are aiming to 

evaluate the effect of the same drug on the disease for a different population in a new trial r'. In 

trial r', we obtain ^{^(l) | R = r'} = 0.6 and ^{S'(O) | R = r'} = 0.4. If we assume that the 
PS ACEs in the new trial are the same as those in the validation trial and monotonicity holds in 
the new trial, Proposition^fi) implies that 

ACE? = ACE? x ACE ssy = ACE? x ACE sS , r = (0.6 - 0.4) x 0.5 = 0.10 > 0. 

If monotonicity fails in the new trial, we can still calculate the bounds for ACE? according to 
Proposition [3]( ii): 

ACE? < ( ACE sS y + ACE- ss y)/ 2 + ACE? x ( ACE sS y - ACE- ss y )/2 

= (0.5-0.4)/2 +(0.6-0.4) x (0.5 +0.4)/2 = 0.14, (2) 

ACE? > ACE? x ACEsgy = (0.6 - 0.4) x 0.5 = 0.10. (3) 

If we only assume that ACE ss y = ACEssy = 0 and ACE S sy + ACE Ss y > 0 in the new trial as 
in the validation trial, then ACE?, = 0.2 > 0 and ACE s gy = 0.5 > 0 imply ACE?, > 0, according 
to Corollary^ with or without monotonicity. 
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Example 4. Suppose that in the validation trial, we studied drug A and estimated ACE ss ^ r = 
ACEss, r = 0, ACE aS , r = 0.5 and ACE Ss , r = —0.4. For Case 2, we are aiming to evaluate the 
effect of another drug B on the disease for the population in a new trial r'. Unlike Case 1, we need 
to define the new drug as Z', the principal stratification as U' = (. S(Z' = 1 ),S(Z' = 0)), and the 
PSACEs as ACE ur i = E{Y(Z' = 1) — Y(Z' = 0) | U' = it, R = r'}. In the new trial, we obtain 
EjS^l) | R = r'} = 0.6 and E{5(0) | R = r'} = 0.4. Similar to Example 3, if we assume that the 
PSACEs in the new trial are the same as those in the validation trial and monotonicity holds in 
the new trial, we can calculate ACEy = 0.1. If monotonicity fails, we can obtain the bounds Q) 
and Q) for ACE. If we assume ACE ss y = ACE ss y = 0 and ACE sS y + ACE Ss y > 0 as in 
Example [$| we can deduce that ACEy > 0 according to Cored/art/ [ 7 ] 

Assuming the same values of PSACEs enables us to quantitatively evaluate the surrogate, and 
only assuming the same signs of PSACEs still allows us for qualitative evaluation. Note that we 
give only the sufficient conditions to evaluate the surrogate in a new trial. The plausibilities of 
these assumptions depend on subject knowledge and experts’ opinions. 

5. Simulation studies 

In this section, we conduct simulation studies to evaluate finite sample performances of our proposed 
approaches, under both correctly specified and misspecified models. 


5.1. Estimation under the models with homogeneity 

We show the simulation results of the MLEs and the credible intervals of the PSACEs with and 
without monotonicity. The 95% credible intervals are obtained by the Gibbs Sampler with 20,000 
iterations and the first 4, 000 iterations as the burn-in period. We repeat 200 times to get the 
coverage proportions of the true PSACEs for each setting. We use three different sample sizes, and 
let N denote the average sample size for each trial, i.e., N = total sample size/N r. In Figures 1(a) 


and |l(b)| we show the biases and root mean squared errors (RMSEs) of the MLEs and the coverage 
proportions of the posterior credible intervals only for ACE SS . The results for other principal strata 
are similar and shown in the online Appendix. 

First, with monotonicity, we generate R from a categorical distribution with p r = I/Nr for all r. 
We generate Z from Bernoulli distributions conditional on R with different conditional probabilities, 
ol , allowing for different treatment assignment probabilities for different trials. We generate U from 
categorical distributions conditional on R with probabilities (7r ss>r , 7r s g jT ., 7r S s, r ) for trial r. In order 
to satisfy Assumption |4j we choose U \ R = r to depend on R = r. We generate three scenarios 
with Nr = 2,3 and 5, and present the true values of (7r SSjr , Tr S sy, TTssy) and a in the upper panel 
of Table [l] We generate Y from Bernoulli distributions with the following conditional probabilities 
given Z and U: 

S 0 ,ss , 5i,,s, <W, * 1 , 8 *. 6o,ss) = (0-8,0.5,0.7,0.3,0.6,0.1), 

with PSACEs ACE as = 0.3, ACE sS = 0.4, a nd A CE SS = 0.5. 

We show the simulation results in Figure [T(a)| where the numbers “2,” “3” and “5” denote the 
numbers of trials. The biases are small for all scenarios, the RMSEs decrease as the sample size 
and the number of trials increase, and the coverage proportions are close to 95% for all scenarios. 
Comparing the 6th point with the 7th point in the second subplot of Figure l(a)| we can see that the 










Table 1 . True parameters in simulations. We present the values of n ur = P(U — r\R = r) 
and a r = P(Z = 1 | R = r) in each scenario. The upper panel shows the true parameters 
with monotonicity, and the lower panel shows the true parameters without monotonicity. 



Nr = 2 


With monotonicity 

Nr = 3 


Nr =5 



(7Tss,r j 7T ss,ri ^ss,r ) 

a r 

{^ss,r j 7Tss,r? 7Tss,r) 

a r 

(tT ss,r 5 7 Tss,r 5 ^ ss,r ) 

a r 

r = 1 

(0.7, 0.2, 0.1) 

0.4 

(0.8, 0.1, 0.1) 

0.4 

(0.8,0.1,0.1) 

0.3 

r = 2 

(0.1,0.2,0.7) 

0.6 

(0.1,0.8,0.1) 

0.5 

(0.6, 0.3,0.1) 

0.4 

r = 3 



(0.1, 0.1, 0.8) 

0.6 

(0.3, 0.2, 0.5) 

0.5 

r = 4 





(0.1, 0.3,0.6) 

0.6 

r = 5 





(0.1,0.1,0.8) 

0.7 




Without monotonicity 





Nr =3 


Nr =4 


Nr =5 



(^ss,rj ?Tss,r j 7T ss,r ? 7Tss,r) 

a r 

(7Tss,r5 7T S s^r^ 7Tss,r j 7Tss,r) 

a r 

(7Tss,rj 7Tss,r5 'Kss,n 7Tss,r) 

OL r 

r = 1 

(0.6, 0.2, 0.1, 0.1) 

0.4 

(0.6,0.2,0.1,0.1) 

0.4 

(0.6, 0.2, 0.1, 0.1) 

0.3 

r = 2 

(0.1,0.6,0.2,0.1) 

0.5 

(0.1,0.6, 0.2,0.1) 

0.5 

(0.1, 0.6,0.2, 0.1) 

0.4 

r = 3 

(0.1, 0.1, 0.6, 0.2) 

0.6 

(0.1,0.1,0.6,0.2) 

0.6 

(0.3,0.2,0.3,0.2) 

0.5 

r = 4 



(0.2,0.3,0.2,0.3) 

0.7 

(0.4, 0.1, 0.4, 0.1) 

0.6 

r = 5 





(0.1,0.1,0.6,0.2) 

0.7 


RMSE of the case with (Nr = 2 , N = 500) is smaller than that of the case with (Nr = 5 ,N = 200). 
The total sample sizes for these two cases are the same, while the case of the 6th point has more 
parameters (more 7r ur .’s with more trials). The simulation is consistent with our intuition that more 
parameters will result in a larger RMSE. 

Next, without monotonicity, we generate R from a categorical distribution with p r = 1/Nr for 
all r. We generate Z from Bernoulli distributions with conditional probabilities ct, and U from 
categorical distributions both conditional on R with probabilities (7 t SS)T ., ^ss,ri ^ss,rt ^ss,r) for trial 
r. For this case without monotonicity, the necessary condition for identifiability is Nr > 3. Thus we 
generate three scenarios with Nr = 3,4 and 5, and present the true values of (ir ss , ri 7r s g jr , ir S s,r, 7r Ss , r ) 
and a in the lower panel of Table [T| We generate Y from Bernoulli distributions with conditional 
probabilities given U and Z as 


(^1,SS: ^0 ,SS 1 ^1 ,ss: ^0,ss; ^l,ss; <^0,SSj ^l,ss? ^0,ss) 


(0.8,0.5,0.7,0.3,0.6,0.1,0.5,0.2), 


with PSACEs ACE SS = 0.3, ACE sS = 0.4, A CE SS = 0.5, and ACE Ss = 0.3. 

We show the simulation results in Figure [T(b)| where “3,” “4” and “5” denote the numbers of 
trials. The biases and the coverage proportions are similar to the cases with monotonicity. But the 
RMSEs become a little larger than those with monotonicity. The performances of our approaches 
are quite promising for the sample size N = 500 x Nr, comparable to the ACCTs data set in 
Section [6l 


5.2. Simulations under the models without homogeneity 

We conduct simulation studies when the homogeneity assumption is false with 5 zur dependent on 
r. Let d be a measure of heterogeneity. We simulate three cases for different levels of heterogeneity 
with d = 0.01,0.025 and 0.05. For each case, we conduct simulation studies when Nr = 3 both 
with and without monotonicity. 

With monotonicity, we first generate (Z, U) the same as the scenario with Nr = 3 in the 
upper panel of Table [l] To allow heterogeneity, we let S zu i = p zu — (— l) z d, 5 ZU 2 = p zu and 
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5 ZU 3 = p zu + (—1 ) z d, with S zur varying with r. The corresponding mean vector is 

{pi,ssi Po.ss ? Pi ,ss> Po,ssi Pi,ssi POTSs') (0.8,0.5,0.T, 0.3,0.6,0.1). 

Then we have ACE U i = p lu - p 0 u + 2 d, ACE u2 = p\ u - Pou and ACE U3 = p lu - p 0u - 2 d. A 
larger d results in a larger violation of the homogeneity assumption. 

Similarly, without monotonicity, we first generate ( Z, U ) the same as the scenario with Nr = 3 
in the lower panel of Table[l] Then we let 5 zul = p zu - (-1 ) z d, 5 zu2 = p zu and S zu3 = p zu + (-1 ) z d, 
with the corresponding mean vector 

( Pi,ss-> Po,ss , Pi,ss: Po,ssj Pi,sin Po,ss: Pi,ssi Mo,ss) = (0-8,0.5,0.7,0.3,0.6,0.1,0.5,0.2). 


Then we have ACE ul = p lu - p 0u + 2d, ACE u2 = pi u - po u , and ACE u3 = p lu - p 0u - 2d. 

For all the evaluations, we use the mean parameters p zu as the “true parameters.” Figures 
1(c) and |l(d)l show the results for ACE SS including the biases, the RMSEs of the MLEs and the 
coverage proportions of the posterior credible intervals, where “1,” “2” and “3” correspond to the 
cases of d = 0.01, 0.025 and 0.05, respectively. 

Comparing Figures 1(a) and |l(bj| under the homogeneity assumption with Figures 1(c) and 
|l(d)| without the homogeneity assumption, we find that the biases increase as d increases, and do 
not decrease as the sample size grows. The RMSEs increase and the coverage proportions of 95% 
credible intervals decrease as d increases, especially for large sample sizes. This means that the 
point and interval estimates are sensitive to the heterogeneity of trials. Thus, we will propose an 
approach for sensitivity analysis, and apply it to our real application in Section 7. 


6. Application to the ACCTs data 


For the ACCTs, the OS with five-year follow-up (survival status within 5 years) is used as the 
endpoint in most earlier papers to evaluate a particular treatment regimen. However, this endpoint 
requires five-year follow-up. The goal of the ACCTs study is to explore whether DFS with three- 
year follow-up (cancer reoccurred before 3 years) is a valid surrogate for the OS with five-year 
follow-up. The data contain more than 20, 000 patients and 18 randomized III clinical trials. The 
period of trial enrollment spans from 1977 to 1999. The data of 10 of the randomized trials are 


available from Baker et al. (2012). Baker et al. (2012) transformed the survival data into counts for 


binary outcomes. As a result, in each trial, we have a contingency table of observed frequencies with 
three variables: the treatment (Z), DFS (S) and OS (Y). To illustrate our approaches, we assume 
all the trials have the same treatment and control, although we do not know the exact treatment 
and control in each trial. This may influence the plausibility of the homogeneity assumption, and 
thus we will conduct a sensitivity analysis in Section 7. We show all the bound analysis in the 
online Appendix, and find that the bounds are barely informative. In this section, we apply the 
proposed approaches to the ACCTs data to obtain point identifications of the treatment effects on 
the endpoint OS within principal strata defined by the potential DFS. We postpone the sensitivity 
analysis to Section [ 7 ] 


6.1. Results with monotonicity 

We use the Gibbs Sampler to simulate the posterior distributions of the PSACEs, because it is 
very direct to obtain posterior credible intervals for the PSACEs from draws of the Gibbs Sampler. 


















Bias 


RMSE 


Coverage 





(a) Correctly specified model with monotonicity 


Bias 


RMSE 


Coverage 




Bias RMSE Coverage 




Bias RMSE Coverage 





(d) Misspecified model without monotonicity 


Fig. 1. Simulations for homogeneity (a) and (b) and heterogeneity (c) and (d). Each subgraph presents the 
bias, RMSE, coverage proportions of the 95% credible intervals of ACE SS . Nine combinations of N R and N 
are shown in (a) and (b); and nine combinations of N and d (“1” for .01, “2” for .025, “3” for .05) are shown in 
(c) and (d). 
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After 100,000 iterations with 50,000 used as burn-in, the Markov chains converge very well with 
Gelman- Rubin diagnostic statistics approximately equal to 1 from five independent chains. Figure 
|2 (a) | shows the histograms of the posterior draws of the PSACEs under monotonicity. The posterior 
median of ACE SS , —0.008, is small, and its 95% credible interval covers zero. Although the posterior 
median of ACE SS , 0.032, is also small, its 95% credible interval does not cover zero. If we would 
like to believe monotonicity, there exists a “principal stratification direct effect” of the treatment 
on the endpoint for stratum ss but not for stratum ss. and thus the causal necessity is not well 
satisfied. Because ACE sS is significantly nonzero with posterior median 0.583 and 95% credible 
interval (0.235,0.787), and the treatment is effective on the surrogate with posterior medians of 
ACEf greater than 0.022 in all trials, we can conclude that it is also effective on the endpoint 
according to Corollary [l] (i). 


6.2. Results without monotonicity 

Figure [2(b)| shows the histograms of posterior draws of the PSACEs without monotonicity. First, 
ACE SS and ACE SS are very close to zero with posterior medians 0.014 and 0.001 and 95% credible 
intervals covering zero. When the treatment does not affect the DFS with three-year follow-up, 
it will not affect the OS with five-year follow-up. Second, ACE sS and ACE Ss are significantly 
different from zero with posterior medians 0.774 and —0.750 and 95% credible intervals excluding 
zero. Therefore, when the treatment affects the DFS with three-year follow-up, it will also affect 
the OS with five-year follow-up. The results without monotonicity differ from the results with 
monotonicity in the posterior distribution of ACE SS1 which give us different interpretations about 
the causal mechanism. From a practical point of view, we need to verify which model is more 
plausible based on the observed data. 


6.3. Model comparison and checking 

We use the methods described in Section |X3| to perform model comparison. The ppp’s are 0.039 
and 0.470 under the models with and without nronotonicity, respectively. Analogously, the p-values 
from the likelihood ratio tests are 0.015 and 0.140 under the models with and without monotonicity, 
respectively. Therefore, the model with monotonicity is not compatible with the observed data, and 
it is rejected by both frequentists’ and Bayesian methods. Furthermore, the lower right subplot of 
Figure 2(c) shows that the posterior medians and 95% credible intervals of 7r SSir under the model 
without monotonicity are far from zero in many trials. On the contrary, the large p -values from both 
the likelihood ratio test and the posterior predictive check indicate very good fit of the model without 
monotonicity to the observed data. Although previous methods often assumed nronotonicity, our 
tests reject it. Therefore, our methods can be used in a broader scope of applications and are more 
credible when monotonicity fails. 


6.4. Evaluation of principal surrogate in the ACCTs data 

According to the discussion in Section 5 about model checking, we believe that the model without 
monotonicity is more credible, and thus all our discussion below is based on the results in Section 
|6.2| from the model without monotonicity. 

In Section 6.2 we obtain that the posterior medians of ACE SS and ACE SS are very close to 
zero, and both of their 95% credible intervals cover zero, which means that the candidate surro¬ 
gate satisfies the causal necessity very well. Therefore, our analysis verifies the causal necessity 
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(c) Posterior distributions of the principal strata without monotonicity. 


Fig. 2. The solid lines are the posterior medians, and the dotted lines are the posterior 2.5% and 97.5% 
quantiles. 
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assumption in Baker et al. (2012). In addition, the posterior medians of ACE sS and ACE Ss are 
0.772 and —0.748, respectively, with both of their 95% credible intervals excluding zero, show¬ 
ing that the surrogate also satisfies the causal sufficiency very well. We then have approximately 
ACE SS = 0 , ACEs S = 0 ,ACE sS > 0, and ACE sS + ACE Ss > 0. Thus, according to Corollary 1 (ii), 
the surrogate paradox can be avoided by using this surrogate, i.e., if the treatment has a positive 
causal effect on the surrogate, it must have a positive causal effect on the endpoint. In fact, the 
treatment has a positive average causal effect on the surrogate (posterior medians of ACE ® are 
greater than 0.015 in all trials), and we can conclude that it has a positive average causal effect on 
the endpoint. 


7. Sensitivity analysis without homogeneity 

The homogeneity assumption is crucial for the identifiability of PSACEs. It may be violated if 
different trials with different environments may affect the endpoint. Instead of assuming that 8 zur 
does not depend on r, we propose a Bayesian hierarchical model to account for the heterogeneity 
among S zur for different trials. We keep the conditional distributions of P(R ), P(Z \ R) and 
P(U | R) unchanged, but assume the following hierarchical model for P(Y \ Z, U, R): 

Y\Z = z,U = u, R = r ~ Bernoulli ((5 2 „ r ), 
logit (<5 

zur ) ~ N(p 

zm & ) • 

In this model, deviation from the homogeneity assumption is characterized by the sensitivity pa¬ 
rameter cr. When cr = 0, S zur = S zu and thus the homogeneity assumption holds. When o > 0, the 
homogeneity assumption is violated. For example, when p zu = logit(0.3) and cr = 0.5, the parame¬ 
ter S zur falls in the interval (0.139,0.533) with probability 0.95; when p zu = logit(0.5) and cr = 0.5, 
the parameter 5 zur falls in the interval (0.273,0.727) with probability 0.95. Since the parameter 
S zur is within the interval [0,1], the above intervals imply quite large deviations away from the 
homogeneity assumption. Therefore, we choose 0.5 as the maximum value of the sensitivity param¬ 
eter cr, and choose 0.05 and 0.2 as two moderate values of cr. In our Bayesian analysis, we choose 
the following priors: {P{Z = 1 | R = r) : r = 1, ..., Nr} ~ Dirichlet(l, • • ■ ,1), P{Z = 1 | R = r) 
~ 1/(0,1), {7T SSjr ,7T S s,r,'7rss,7-,7rss,r} ~ Dirichlet(l, ■ ■ • ,1), and p zu ~ U(—5,5). We use the prior of 
p zu for numerical stability, and also [logit^ -1 - ) (—5), logit < ~ 1 ' ) (5)] = [0.007,0.993] ~ [0,1] is not very 
restrictive. The details of the Gibbs Sampler for the Bayesian hierarchical model are given in the 
online Appendix. 

We reanalyze the ACCTs data using the Bayesian hierarchical models with results shown in 
Figure [3j Four rows correspond to the four principal strata (U = ss, ss, ss, ss), and three columns 
correspond to the three values of cr (cr = 0.05,0.2,0.5). For each subgraph, we draw the 2.5%, 
50% and 97.5% posterior quantiles of PSACEs obtained from both the homogeneity model and 
Bayesian hierarchical models. The posterior distributions of PSACEs are robust for these different 
values of the sensitivity parameter cr, although the intervals become wider as cr increases. The signs 
of ACE sS and ACE Ss remain positive and negative respectively, and the intervals of ACE SS and 
ACE S s still cover zero. The results obtained by Bayesian hierarchical models remain unchanged by 
removing the homogeneity assumption, which show the robustness of our results in Section [6] 
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Fig. 3. Sensitivity analysis for the ACCTs data. The rows correspond to the four principal strata and the 
columns correspond to the three different values of a. The dashed lines are the posterior 2.5%, 50% and 
97.5% quantiles from Bayesian hierarchical models, and the solid grey lines are the posterior quantiles from 
the homogeneity model. 
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8. Discussion 


According to Frangakis and Rubin (2002), evaluation of surrogates requires identification of the 


causal effects within principal strata. Because the potential outcomes of the surrogate endpoints 
under treatment and control cannot be jointly observed, the value of the principal stratification 
variable is missing. The monotonicity assumption excludes all the individuals with surrogate nega¬ 
tively affected by the treatment, which may not be biologically reasonable. The exclusion restriction 
assumption cannot be made when evaluating surrogates, because it implies the causal necessity and 
the validity of the causal necessity is a scientific question of interest. Unfortunately, the identi- 
fiability of the PSACEs are jeopardized without monotonicity or exclusion restriction. Although 
Bayesian analysis for weakly identified models still yields proper posterior distributions under proper 
priors, it may result in sensitive answers to the prior specification. 

The ACCTs data contain multiple independent trials for evaluating the same surrogate for 
the same endpoint. If we could make the homogeneity assumption across multiple trials, we can 
remove the monotonicity and exclusion restriction assumptions but still guarantee identifiability 
of the PSACEs. We find in the ACCTs data that both the “causal necessity” and the “causal 
sufficiency” hold, which imply that the DFS with three-year follow-up is a valid surrogate for OS 
with five-year follow-up. We investigate the applicability of the principal surrogate in new trials 
on new populations or with new treatments, and show that the surrogate in the ACCTs data 
could be a useful surrogate under some conditions. To remove the homogeneity assumption, we 
further propose an approach based on Bayesian hierarchical models, and investigate the sensitivity 
to the deviations from the homogeneity assumption. Within a reasonable range of the sensitivity 
parameter, the conclusions for the ACCTs data remain stable. 

The framework we proposed could be applied to various settings involving post-treatment inter¬ 
mediate variables where the monotonicity and exclusion restriction assumptions are questionable. 
For example, in randomized experiments with non-compliance, the assignment of the treatment 
may influence the outcome directly. In these cases, we should concern the plausibility of exclusion 
restriction assumptions versus the plausibility of homogeneity assumptions. Models with the homo¬ 
geneity assumption may sometimes be more flexible in practice. With the homogeneity assumption, 
we can test the monotonicity assumption. Furthermore, even if the homogeneity assumption does 
not hold, we can still conduct a sensitivity analysis as in Section [ 7 ] 

Follmann (20061, Qin et al. (2008) and Mattei and Mealli| (|2011|) proposed augmented designs 


for single trials to identify principal stratification causal effects using additional variables. From the 


perspective of experimental design, our approach can be viewed as an extension of Follmann (2006) 


which imposes a stronger monotonicity assumption for the treatment effect on the intermediate 
variable. In our approaches, the number of trials R can be more general, as long as it satisfies the 
assumptions for identification. We can design experiments to create a variable R, satisfying these 
assumptions in order to identify the PSACEs. For example, in an encouragement experiment that 
encourages the patients to take their assignments, we can use different types of encouragement for 
different groups of people. Thus different groups will have different compliance behaviors, and the 
indicator for encouragement types then acts like R in our paper. 

Several generalizations are possible. First, it is relatively easy to deal with a continuous endpoint 


because we can identify the distributional causal effect by dichotomizing the endpoint (Ding et al. 


2011). However, it is non-trivial to deal with a continuous surrogate endpoint since the causal 


necessity and sufficiency cannot be defined by dichotomizing it. Schwartz et al. (2011) set the stage 


for analyzing principal strata effects with continuous intermediate variables using a semiparametric 

























Bayesian approach. Second, we discuss the methods and present an application without missing 
data, and dealing with the missing data problem will be of interest in many real applications. 
Finally, generalizing our approaches to longitudinal data and time-to-event data is also of theoretical 
and practical interest. 
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Supplementary Materials 

Appendix A derives the bounds for the PSACEs both with and without monotonicity, and shows the 
application of these bounds to the ACCTs data. Appendix B provides the proofs for the propositions 
and theorems. Appendix C gives the computational details. Appendix D provides more details about the 
simulation studies. 

We re-introduce all the notation used in the main text as follows: 


Pzsr = P(S = s | Z = z,R = r), 

^Vys\zr P{Y — y, S — s | Z — z,R — r), 
a r = P(Z = 1 | R = r), 

Szur = P(Y = 1 Z = z.U — u, U - r). 

Under the homogeneity assumption, we have: 


Qzsr = P(Y = 1 I Z = z, S = s, R = r), 

Pr = P{R = r), 

TTur = P(U = u\ R = r), 

ACE ur = E{Y{ 1) - y(0) | U = u,R = r}. 

ACE U = E{Y( 1) - Y(0) | U*u} = ACE ur . 

1^7" . 7* — 1, . . . , VI r — TTss,!", 

0,1 ; u = ss, ss, ss, ss}. 


Szu = P(Y = 1 | Z = z,U = u) = 5 zur , 

We further define p = {p r : r = 1,. .., Nr}, a. 
n = { 7 r r : r = 1,.. -,Nr}, and 8 = {S zu ■ z = 


Appendix A: Bounds for PSACEs 

We need the following lemma to simplify the derivations of the bounds. 

Lemma 1. Let Xo be a mixture of two Bernoulli distributions A'i and X 2 with Xq 
a)X 2 ,X-t ~ Bernoulli(pi ), and a known mixing proportion a. Then we have 


max ^0,1 — 
max I 0,1 — 


f ~Po 
a 

WwPo 
1 — a 


^ < pi < min (l, , 


< P 2 < min I 1 


Po 

1 — a 


Proof of Lemma 1. See Cheng and Small (2006). 


aA'i + (1 — 


□ 


Appendix A.l: Bounds without monotonicity 

Let \Jpper nm (ACE ur ) and Lower n m(ACE ur ) denote the upper and lower bounds of ACE ur without mono¬ 
tonicity, respectively. 

Proposition A.4. Without monotonicity, the bounds of ACE SSir are 

QllrPll 


Upper nrn (ACE ss , r ) = min ^ 1, 


Poir PlOr 


i * j (1 Qoir)Poir 

+ mm i —5 - p - 1 

-M)lr — cio r 


r t Ar’TT \ In I (1 — Qllr)AlT- \ . f 1 QoirPoi 1 

Lower nrn (AC E SS:r ) = max } 0,1----- -mm 1, —--— >, 

[ To 1 r ~ TlOr J [_ To 1 r — TlOr J 

ZE S s, r are 

1 f„ QoOrPoOr ~ PlOr 1 

r n “i 0 ’ n,r-n,r }’ 


if Poir > PlOr) and [—1,1] if Poir < Pior • The bounds of ACE S s,r are 

QllrPll 


Upper nrn (ACE S s, r ) = min<M, 


Lower n m(ACE S s,r) = max < 0 


Pllr Poir 
QllrPllr 
-Pllr Poir 


— 1 > — min < 1, 


QoOrPoOr 
Pllr Poir 

















if Poir < Pllr, and [— 1 , 1 ] if Poir > Pllr- The bounds of AC Ess,r 


are: 


Upper nrn (ACEss,r) = minjl, 
Lower n m(ACEss,r) = max < 0, 


QlOrP 10 r 

PlOr -^Olr 

QlOrPlOr Poir 

PlOr Pdr 

if Poir < Pior, and [—1,1] if Poir > PlOr- The bounds of AC Ess, r are: 



Upper nrn (ACEs S ,r) = minjl, 
Lower n m{ACEss,r ) = max|o, 

if Poir > Piir, and [—1,1] if Poir < Piir- 


Q lOrPlOr 

Poir Pllr 
QlOrPlOr PoOr 

POI r Pi 1 r 


— max < 0 


— min < 1 


QoOrPoOr Pllr 
PlOr Pdr 
QoOrPoOr 
PlOr Poir 


QoirPoir Pllr 
Pdr Pllr 
QoirPoir 


Poir — Pllr 


Proof of Proposition A.4 Without monotonicity, each of the subpopulation with (Z = z, S = s) is a mixture 
of two latent principal strata, implying that 



7I"ss,r + 7Tss,r 

= -Poor, 

(A.4) 


7I"ss,r H - 7Tss,r 

Pdr? 

(A.5) 


7^ss,r H - 7Tss,r 

PlOr? 

(A.6) 


7I"ss,r + 7Tss,r 

= Pllr, 

(A.7) 


7Tss,r + 7Tss,r 7Tss,r H - 7Tss,r 

= 1, 

(A.8) 


r 7Tss,r . ^"ss,r 

Ol,ss,r D \ 0±,ss,r D 

Pllr Pllr 

— Qllr, 

(A.9) 


r 7Tss,r . 7rss,r 

0O,ss,r D r O0,ss,r D 

Pdr Poir 

— Qoir* 

(A.10) 

know 1Tss,r 

in Equations (A.4| and (A.81, 

we can express 7r„r’s in terms of 7Ts Si r 


7T ss,r 

— Poir 7I"ss,r, 


(A.ll) 

^ss,r 

PlOr 7T ss,r ? 


(A.12) 

7^ss,r 

— 1 (7Tss,r + 7Tss,r H - 7Tss,r) 

= Pllr Pdr + 7Tss,r* 

(A.13) 


Since 0 < 7r ur < 1, we can obtain the bounds for 7Vs S , r from Equations ( |A.ll ) to (A. 13): 

7Tss,r G [max(0, Poir - Pllr), min(Poir, PlOr)] = 7Z r . 

From Lemmaand Equations (A.9) and (A. To]), we can get the bounds of ACE ss , r for a given 7rs S , r : 

Upper : nrn {ACE ss,r | 7Tss,r) 


= minjl, 
= min < 1, 


— max < 0, 


QllrPllr 
Pdr TTss 
QllrPllr 
Pdr TTss 


Lower nrn (ACE ss ,r | 7 Tss,r) 

QllrPllr Pllr Poir 7Tss,r 
Poir 7Tss,r 
(1 — Qllr)Pll 


QoirPoir 7Tss,r 
Poir 7Tss,r 


i • In Qoir)Poir n 

+ mm < 0, ---1 


Poir 7I"ss,r 


max jO 
max < 0,1 — 


— min < 1 


QoirPoir 
Pdr TTss 


Poir TTs 


— min < 1, 


QoirPoir 
Poir TTss 
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The bounds for ACE SStr can be obtained by maximizing or minimizing the above bounds over 71 
feasible region of Has,r, namely, 

Upper nm {ACE SS}r ) 


= max 

ir§8,r GTZ r 


i QllrPllr 1 . f n (1 Qoir)Poir 

mm <! 1 , —- \ + mm < 0 , - 1 

Pol r 'Kss,r J [ 01 r 7T ss,r 


Lower nm ( ACE S . 


= mm 

TTss.r €7?. r 


i n (1 Qllr)Pllr \ . f * QoirPoi 

max < 0 , 1 — --- > — mm < 1 


Poir 7Tss,r 


Since 


iin 11 


is increasing in 7 Ts S , r , and 


max < 0,1 — 


Poir 'Kss, 

is decreasing in 7Vs S ,r , the above bounds can be simplified as 


-Poir 7Tss,r 

} 

(1 Qllr)Pllr I . f QoirPoir 1 
— r — mm 1 U 75 - ( 

r 1 1 Poir 7Tss,r 1 


QllrPll 
Poir 7Tss,r 


min <1 1 , | + min |o, v \, - 1 


(1 Qoir)Poir 
Pdr 7Tss,r 


tt / A jm \ • ) a QllrPllr 1 . . f n (1 Qoir)Poir 

Upper nrn (ACE ss , r ) = mm <J 1, -- 5 — } + min <J 0 , —-—--1 

-Milr -t lOr 

QoirPoi 


Poir PlOr 


Lower nm (AC E ss ,r) = max {o, 1 - Pi \ _ m in {1, 

t Poir — PlOr J t Poir — pLOr 

if Poir ^ PlOr, and [ 1, 1] if Loir <C PlOr- 

Similarly, we can obtain the bounds of the other three principal strata. 


Appendix A.2: Bounds with monotonicity 

Let Upper m (ACE vr ) and Lower m ( AC-Eur) denote the upper and lower bounds of ACE ur without 
tonicity, respectively. 

Proposition A. 2. With monotonicity, the bounds of ACE aStr are 

Upperm(ACE ss , r ) = min ( 1 , 9 L1 ^ Pllr \ _ Q 01r? 

l Poir J 

t / A r~i \ ( n QllrPllr Pllr H" Poir 1 ^ 

Lowerm{ACE S s,r) = max 0 ,---> - Q oir; 

L Poir J 

the bounds for ACE S s, r are 


tt t A mm \ • J i QllrPllr 1 f n QoOrPoOr PlOr 

Upper m (ACE sSir ) = mm { 1, --— \ - max { 0, —- - - 

Pllr Pdr 

QoOrP 30r 

Pll — Poir 


Pllr — P(J 

Lower m (AC E aa , r ) = max -I 0 , Qpfl. tdl -—t. _ m in -J 1 


and the bounds for ACE 3a , r are 


Upper m (ACEss, r ) = Qwr — max l 0 


Lowerm{ACEss , r ) = Qio — min < 1 


QoOr-PoOr Pllr T PlOr 


QoOr-PoOr 


25 

. r , the 


□ 

mono- 



Proof of Proposition ^^| When monotonicity holds, we have TTss,r = 0, and we can identify the proportions 
of all the principal strata, and thus we have 

Upper m (ACE ss , r ) = Upper nrn (ACE ss ,r \ Tts S ,r = 0), 

Lower m (AC E ss t r) ~~ Lower nm(ACE s s,r \ 7Tss,r — 0). 

Similarly, we can obtain the bounds of the other two principal strata. □ 


Appendix A.3: Bounds for the ACCTs data 

We compute bounds of the PSACEs for the 10 trials separately. Figure[5|sho ws the large sample bounds and 
confidence intervals for PSACEs based on the bootstrap (Cheng and Small 2006). Figure 5(b) shows that, 


without monotonicity, bounds of the PSACEs are barely informative. These bounds always contain zero, 
and bounds of ACE a s,r and ACEs a ,r are [—1,1], which are too wide for us to get any useful information. 


Figure 5(a) shows that, with monotonicity, bounds are sometimes informative. But the confidence intervals 
for the PSACEs always contain zero, which do not provide strong evidence for the presence of causal effects. 

Standard resampling techniques, including the bootstrap, may be inconsistent and lead to confidence 
sets that are not asymptotically valid in a point-wise or uniform sense for drawing inference on partially 


identified quantities (Andrews 2000 Andrews and Guggenberger 2009 Romano and Shaikh 20101. Here 


we follow Cheng and Small (20061’s procedure to obtain the confidence intervals for the bounds, which is 


easy to implement. For more careful analysis, we can use the methods developed by Chernozhukov ct al. 
(2013). However, we do not do this here because calculating the bounds is not our primary goal. And what 


is more, the bounds are barely informative in our application. 


Appendix B: Proofs of the propositions and theorems 

Appendix B.l: Proof of Proposition 1 in Section 3 

Proposition 1. Under Assumption 1 , Assumption 3 is equivalent to | (U,Z), which implies 

ACE ur = ACE ur i for all u = ss, ss, ss, ss and r ^ r 1 . 

Proof of Proposition^ 7] Under randomization and homogeneity (Assumption 3), we have 

P(Y = 1 | U = u,R = r,Z = z) 

= P{Y(z) = l\U = u,R = r,Z = z} 

= P{Y(z) = 1 | U = u,R = r} = P{Y(z) = 1 | U = u}, 

and 


P(Y = 1 | U = u,Z = z) 

= J2 P ^ Y = l\ U = u,R = r,Z = z)- P(R = r\U ~u,Z = z) 

r 

= J2 (z) = 1 I u = u}- P(R = r\ U = u,Z = z) 

r 

= P{Y(z) = 1 | U = u}-J2 P(R = r\ U = u,Z = z) 

r 

= P{Y(z ) = 1 | U = u} = P(Y = 1 | U = u,R = r,Z = z), 

implying R 11 Y \ ( U, Z ). Thus we have 

P(Y = 1 | U = u,R = r,Z = z) = P(Y =1\U = u,R = r',Z = z) 
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for z = 0,1; u = ss, ss, ss, ss; and r,r' = 1,, Nr. Under homogeneity (Assumption 3), we have E{Y(z) \ 
U = u,R = r} = E{Y(z) \ U = u}, and thus ACE ur = ACE ur i, for z = 0,1; u = ss,ss, ss,ss; and 
r,r' = 1,..., Nr. In addition, under randomization and homogeneity (Assumption 3), we have 

P{Y(z) \ U = u,R = r}= P(Y \ Z = z,U = u,R = r) = P(Y \ Z = z,U = u), 
and thus ACE U = 5\ u — Sg u does not depend on r. □ 


Appendix B.2: Proof of Theorem 1 in Section 3.1 

Theorem 2. Under Assumptions 1 to 4, we have that, for u = ss, ss and ss, 

(a) P(Y = 1 | Z = 1,U = u) for all principal strata u are identifiable if Assumption 4(u) holds; 

(b) P(Y *■ 1 | Z = 0, U = u) for all principal strata u are identifiable if Assumption 4(b) holds; 

(c) ACE U for all principal strata u are identifiable if both Assumptions 4(a) an d 4(b) hold. 

Proof of Theorem [2| First, we can identify the proportions of all the principal strata under monotonicity. 
Let uj ys \ zr = P(Y = y, S = s Z = z, R = r) denote the probability of (Y = y, S = s) given (Z = z,R = r), 
which can be identified from the observed data. 

The subpopulation with (Z = 1, S = 0) is equivalent to stratum ss under treatment, and the subpopu¬ 
lation with (Z = 0, S = 1) is equivalent to stratum ss under control. Homogeneity (Assumption 3) implies 

that Wiopr = <5i,ss ■ itss.r and u!n|o r = So,ss ■ 7r SSjr , and therefore we can identify <5i,sa = uqo|i r /irss,r and 

<5o,aa = Wnior/TTss.r from the observed data directly. 

The subpopulation with (Z = 1, S = 1) is a mixture of two latent strata ss and ss, and under homo¬ 
geneity (Assumption 3) we have: 

Uqi|lr ~ ^l,aa ' rV ss ^ r -f- <fl,sa ' 7T S a,r- 

For the two trials n and r 2 satisfying Assumption 4(a) with 7 r ss , ri 7 r s s , r2 ^ ‘JTss,r 1 / n'ss,r 2 , we have: 

1111 ^l,aa * 7Tss,ri T ^l,aa ' 7Tss,ri : 

^ll\lr 2 — <5l,aa * 7 Tsa,r 2 T ^l,aa ' Wss,r 2 , 

from which we can find a unique solution for (<5i, S a, <h,aa): 

KBS,r 2 ul ll\lr 1 ~ 7r aa,r 1 ^ll|lr 2 
7r ss,r'2 7r ss,r'-^ 7r ss,r2 5 
TTss.r! CJ ll|lr 2 ~ 7r s s ,r 2 UJ ll\lr 1 
s, 7*2 'Ess, 7^2 E s s 

Similarly, the subpopulation with (Z — 0, S' = 0) is a mixture of two latent strata ss and ss, and under 
homogeneity (Assumption 3) we have: 

^10|0r = &Q,ss ‘ 7Tss,r &0,ss " 7T ss,r • 

For the two trials r 3 and r 4 satisfying Assumption 4(b) with 7 rss,r 3 7 r s s )r4 7 ^ 7r s s )r 3 7rss,r 4 , we have: 

{ k-h 0 | 0 r 3 ~~ &0,ss * 7T ss,r 3 “h &0,ss ’ 

^ 10 | 0 r 4 = ^ 0 ,ss ' 7Tss )r4 $ 0 ,ss ' 7Tss,r4) 

from which we can obtain a unique solution to (tfo,ss, £o,ss): 

E S s,r 4 ^lQ|Qrg ~,r 3 ^1Q|Qr 4 
Ess ,r 3 7r ss,r’4 E ss ,r^ E^s, r 4 ’ 

7rss,r 3 ^10|Qr 4 —^55^4^1010^ 

Ess ,r 3 E s s, 7-4 E s a ,r$Ess 




Since ACE U = <5i„ — 5q u , we can identify ACE u ’s under Assumptions 1 to 4. 


□ 






Table B.3.2. True values of P(Z = 1 | R = r) 

and P(U = u\ R = r) 



r = 1 

r = 2 

r = 3 

P(Z = 1 I R = r) 

0.4 

0.5 

0.6 

7fss,r 

0.6 

0.1 

0.1 

7fss,r 

0.2 

0.6 

0.1 

ss,r 

0.1 

0.2 

0.6 

7Tss,r 

0.1 

0.1 

0.2 


Appendix B.3: Proof of Proposition 2 in Section 3.2 

Proposition 2 . Under Assumptions 1 and 3, a necessary condition for the local identifiability of the 
joint distribution of (Y, Z, S, U, R) is Nr > 3 . 

Proof of Proposition The observed data of (R , Z, S, Y) form an Nr x 2 x 2 x 2 contingency table 
with (8 Nr — 1) free frequencies, therefore the saturated model contains (8Nr — 1) parameters. The joint 
distributions contains (5 Nr + 7) parameters, with P(R) contributing (Nr — 1) parameters, P(Z \ R) 
contributing Nr parameters, P(U \ R) contributing 3 Nr parameters, and P(Y \ Z, U ) contributing 8 
parameters. Therefore, the model with monotonicity has (8 Nr — 1) — (5 Nr + 7) = (3 Nr — 8) degrees of 
freedom, which is positive when Nr > 3. 


Example B.3.5. Suppose we have three trials. We set true parameters as 


(^l,ss, ^0,ss, <5o,ss, <5l,ss, <5(3,ss, <5l,ss, ^0,ss) 


(0.8,0.5, 0.7, 0.3, 0.6, 0.1, 0.5,0.2), 


and display other parameters in Table B.3.1. 

We can then get the observed distribution of P(Z = z, S = s,Y = y | R = r) as shown in Table B.3.2. 
By the homogeneity assumption, the probabilities of the observed data can be decomposed as 


P(Z = z,S = s,Y = y\ R = r) 

Y P(Z = z \ R = r) ■ 7r ur • P(Y = y \ Z = z,U = u), (B.l) 

uEO(z,s) 


for z,s,y = 0,1. Similar to the proof of Proposition []| we can obtain that the number of parameters on the 
right-hand side of Equation \B.l]j is 20. The observed distribution of (Z,S,Y \ R) form a contingency table 
with 21 free frequencies on the left-hand side of Equation (B.l ). We can use R package neqslv to solve these 
equations. We obtain that 


(^1,33i 3o,ss, ^0,ss: ^l,ss, ^0,ssi ^0,ss) (0.8,0.5, 0.7, 0.3, 0.6, 0.1, 0.5, 0.2), 


which are the same as the true parameters. 


Appendix B.4: Proof of Proposition 3 in Section 4 

Proposition 3 . For trial r, assume that the principal surrogate satisfies the causal necessity. 

(i) With monotonicity, we have ACEf = ACE ® x ACE a s, r . 

(ii) Without monotonicity, suppose ACE ® > 0, we have that, if ACE aa , r + ACE as , r > 0, 

ACEji x ACE aa:r < ACEj < (ACE aa , r + ACE aa , r )/ 2 + ACE(! x ( ACE sa:r - ACE Sa , r )/ 2, 
and otherwise 

(ACE sS , r + ACE aa:r )/ 2 + ACE(! x ( ACE aa , r - ACE- aa , r )/2 < ACE J < ACE(! x ACE aa , r . 
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Table B.3.3. Observed distribution of P(Z = z, S = s,Y = y \ R = r) 




R 

Z = 1 

= 1 

z = 0 

R 

z = 1 

= 2 

Z = 0 

R 

Z = 1 

0 

II 

CO tS] 

II 

s = 

1,Y = 1 

0.248 

0.192 

0.250 

0.035 

0.090 

0.036 

s = 

1,Y = 0 

0.072 

0.228 

0.100 

0.065 

0.030 

0.084 

s = 

0, Y = 1 

0.044 

0.042 

0.085 

0.100 

0.276 

0.036 

s = 

O 

II 

O 

0.036 

0.138 

0.065 

0.300 

0.204 

0.244 


Proof: With monotonicity, from the causal necessity ACE as ,r = ACEss,r = 0, we have 

ACEj = E{Y{ 1) - Y (0) \R = r} = E{Y( 1) - Y(0) \U = ss,R = r}TT S s,r = ACE sa ,r-n sa , r . 
Monotonicity also implies that ACE/ = E{S{ 1) — S'(O) | R = r} = 7r S s jT ., and therefore ACEj = ACEf x 

AC E S s,r- 

Without monotonicity, from the causal necessity, we have 

ACEr = ^2 E{Y( 1) — Y(0) | U = u, R = r}ir ur = ACEss,rTr s s,r + ACEs a , r Trss,r- 

U 

Since ir a s,r — TTsa,r = E{S(1) — S(0) \ R = r} = ACEf, the above equation can be rearranged as 

ACEr = (ACEsa,r + ACE SB , r )ir Sa ,r + ACEr X ACEsa,r. 

From the linear constraints 0 < ns S ,r, TTsa,r < l,7rs S , r + Tr a a, r < 1 and n 3 s, r — Tra S ,r = ACE^, we obtain 
TTss,r £ [max(0, —ACEr), (1 — ACEr)/2] = [0, (1 — ACE^)/ 2]. If ACE S s, r + ACE as ,r > 0, we have 

ACEr X AC Ess,r < ACEr 

< {AC Ess, r + AC Ess, r) x (1 - ACE^)/ 2 + ACE^ x ACE S s, r 

— {ACE s s,r ACEss,r)/2 ACEr X {ACE S s,r — ACEsa,r)/ 2. 

Otherwise, we have 

{ACEaa,r + ACE Ss , r )/2 + ACEs x {AC Ess,r - AC Ess,r)/2 < ACE? < ACE ® x ACE a s,r- 


Appendix C: EM algorithms and the Gibbs Samplers 

Let Ntotal denote the total sample size. Let n zuyr be the frequencies of the complete data, i.e., the 
frequencies of patients with (Z = z,U = u,Y = y, R = r ), for 2 = 0,1; r = 1 ,...,Nr\ y = 0,1; and 
u = ss, ss,ss,ss. Let N zsyr be the frequencies of the observed data, i.e., the frequencies of patients with 
{Z = z, S = s,Y = y, R = r), for 2 = 0,1; r = 1,..., Nr- y = 0,1; and s = 0,1. 


Appendix C.l: EM algorithm with monotonicity 

We can write the complete-data likelihood as 

n r , 

L c {£) a (pr ++ * T ■ av 1++T '(l - a r ) no++ '' • n tt2+“ + 

r= 1 ' u=ss ,ss,ss 

Ntotal 

■ n S“ i+ (t - 5zuT zu ° + ■ n ^ Ui e (c- 1 ) 



where /{•} is the indicator function. Let £*■ k be the estimate of £ after the fc-th iteration. 

In the E-step, we need to calculate = E{n zuyr \ N zsyr , £*■+ given the observed data and ^ k \ 

r/e+1) ffc+1) 

Under monotonicity, it is relatively straightforward to obtain that + sa y r = Noi yr and n\ aayr = Nio yr . 
For other frequencies, we have 
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In the M-step, we update the parameters as follows: 
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Appendix C.2: EM algorithm without monotonicity 

We can write the complete-data likelihood as 


Nr , 

l c (€) oc n [p 


n +++r . a n 1++r( i_ Q ^ ro++r . Yl TT™r 
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(C.2) 


In the E-step, we need to calculate nYYyV = E(n zuyr \ N zsyr , given the observed data and as 
follows: 
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In the M-step, we can update the parameters as follows: 
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Appendix C.3: Gibbs Sampler with monotonicity 

We treat U as the missing data, and the Gibbs sampler iterates between the following imputation and 
posterior steps. In the imputation step, we draw U given the observed data and all the parameters. When 
( Zi = 1, Si — 0), we impute Ui = ss; when (Zi = 0, Si = 1), we impute Ui = ss; when (Zi = 1, Si = 1), we 
impute Ui as: 


P(Ui = 88\Zi = l,Si = l,Ri,Yi,£) 
P(Ui = 88\Zi**l,Si = l,Ri,Yi,S) 


<5^W(1 — Yi TTss,Ri 

'l2u=ss,ss ^1 ,«(■*■ ~ $l,uy-Un u , Ri 

1 - P(Ui = ss \ Zi = l,Si~ 1, Ri, Yi , 0; 


when (Zi = 0, Si = 0), we impute Ui as: 


P(Ui = ss \ Zi = 0,Si = 0,Ri,Yi,£) 
P(Ui = ss \ Zi = 0,Si = 0 ,Ri,Yi,£) 


^o,ss(l ^o,ss) z rr S s,Ri 

22u=sS,ss — <^0 ,u) 1 ~ Y, TTu,R i 

1 - P(Ui = ss | Zi = 0, Si = 0, Ri, Yi, £). 


After imputing U, we can compute the frequencies of the complete data. 

In the posterior step, we draw all the parameters given the complete data. Specifically, we draw the 
parameters from the following conditional distributions: 


(pi,P2,--- ,Pn r ) | • 
a r | • 

(rrss,r , 7T ss,r , riss^r) | ' 

Szu I • 


Dirichlet(n +++ i + 1, • • • , n +++NR + 1), 
Beta(m ++r + 1, n 0 ++ r + 1), Vr, 

Dirichlet(n+ ss + r d - 1, ri-i-ss-\-r d - 1, R-\-ss-\-r d - 1), 
Beta(n 2 „i+ d- 1, n zu o+ + 1), Vw, z. 


Appendix C.4: Gibbs Sampler without monotonicity 

We treat U as the missing data, and the Gibbs sampler iterates between the following imputation and 
posterior steps. In the imputation step, we draw U given the observed data and all the parameters. When 
(Zi = 1, Si = 1), we impute Ui as: 


P(Ui 


P(Ui 


ss | Zi 
ss | Zi 
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when (Zi = 0, Si = 0), we impute Ui as: 
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when (Zi = 1, Si = 0), we impute Ui as: 


P(Ui = 88\Zi = l,Si=0,Ri,Yi,£) 
P(Un&88\Zi = l,Si=0,Ri,Yi,£) 


<5^,(1 — ^l.as) 1 Yi 1T3a,R t 
1 - P(Ui = ss \ Zi = l,Si = 0, Ri, Yi, 0; 


when (Zi = 0, Si = 1), we impute Ui as: 


P(U i = ss\Z i =0,Si = l,Ri,Y i ,£) 
P(Ui = ss\Zi=0,S i = l,R,,Y l ^) 
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After imputing U, we can compute the frequencies of the complete data. 

In the posterior step, we draw all the parameters given the complete data. Specifically, we draw the 
parameters from the following conditional distributions: 


(Pi,P2,- ■ ■ ,Pn r ) | • 
Oir | • 

(if ss ,r i ss,r i 7tss,r i T^ss jr*) | * 

Szn | • 
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Appendix C.5: Gibbs Sampler for the Bayesian hierarchical model in Section 7 

Define 7 ] zv , r = logit( 5 Z ur ), tl = {rj zur : z = 0,1; m = ss, ss, ss, ss}, and 0 = (£, 77 ). The complete data 
likelihood is 
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We treat U as the missing data, and the Gibbs sampler iterates between the following imputation and 
posterior steps. In the imputation step, we draw U given the observed data and all the parameters. When 
(Zi = 1 , Si = 1), we impute Ui as: 


P(Ui = 88\Zi = l,Si = l,Ri,Yi,£) 
P(Ui = 8s\Zi = l,Si = l,Ri,Yi,£) 


^l'ss,Ri (■*■ <5l,ss,«i) ! 7Tss,«i 
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when (Zi = 0, Si = 0), we impute Ui as: 


P(Ui&88\Zi=0,Si=0,Ri,Yi,£) 
P(Ui = ss | Zi = 0, Si = 0,Ri,Yi,£) 
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1 - P(Ui = ss I Zi = 0, Si = 0, Ri, Yi, £); 
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when (Zi = 1 ,Si — 0), we impute Ui as: 


P(Ui = ss | Zi = l. Si = 0 ,Ri,Yi,£) 
P{Ui = ss\Zi = l,S i =0,Ri,Y i ,£) 
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when (Zi = 0, S% = 1), we impute Ui as: 
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After imputing U, we can compute the frequencies of the complete data. 

In the posterior step, we draw all the parameters given the complete data. Specifically, we draw the 
parameters from the following conditional distributions: 
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The first three conditional distributions are standard. The fourth one is a truncated Normal distribution, 
which can be generated by applying the inverse of its cumulative distribution function to a Uniform(0,1) 
random variable. The posterior distribution of rj zur is not standard, but fortunately its conditional density 
is log concave since 

d 2 log {p(r] ZU r | •)} _ __L_ n zu + r e~ rlzur 

dpl ur a 2 (1 + e~ r >zu r )2 


Therefore its posterior distribution is unimodal, and we can use the Metropolized Independence Sampler 
(Liu 2001) to sample ri ZU r using a Normal proposal. We choose the proposal distribution as r) ~ A r (r 7 , V), 
where rj is the mode of the posterior distribution and ip is the inverse of the Fisher information at the mode. 
Due to the log concave density, we can simply use the Newton-Raphson iteration to find the mode rj. 


Appendix D: More details about the simulation studies 


For the correctly specified model, Figures [6(a) | and |6(b)| show the results for ACE a s, and Figures [7(a) | and 
7(b)| show the results for ACEss, with the biases and RMSEs of the MLEs and the coverage proportions 


Figures 
as, and Figures 


6 (c) 


8 (a) 


and 

and 


6 (d) 


8 (b) 


show the results for 
show the results for 


of the posterior credible intervals. For the misspecified models 
ACE a s, Figures 7(c) and 7(d) show the results for j 
ACEss, with the biases and RMSEs of the MLEs and the coverage proportions of the posterior credible 
intervals. The reason why the RMSEs with more trials may be larger for the same strata is that certain 
settings of parameters may make the sample sizes smaller for the strata with more trials. 
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Fig. 5. Bounds for the PSACEs with and without monotonicity. The solid lines are the bounds and the dotted 
lines are the confidence intervals for the bounds. 















Fig. 6. Simulations for homogeneity (a) and (b) and heterogeneity (c) and (d). Each subgraph presents the 
bias, RMSE, coverage proportions of the 95% credible intervals of ACE S s. Nine combinations of N R and N 
are shown in (a) and (b); and nine combinations of N and d (“1” for .01, “2” for .025, “3” for .05) are shown in 
(c) and (d). 
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(a) Correctly specified model with monotonicity 
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(d) Misspecified model without monotonicity 


Fig. 7. Simulations for homogeneity (a) and (b) and heterogeneity (c) and (d). Each subgraph presents the 
bias, RMSE, coverage proportions of the 95% credible intervals of ACEss■ Nine combinations of N R and N 
are shown in (a) and (b); and nine combinations of N and cl (“1” for .01, “2” for .025, “3” for .05) are shown in 
(c) and (d). 
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Bias RMSE Coverage 





(a) Correctly specified model without monotonicity 
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(b) Misspecified model without monotonicity 
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Fig. 8. Simulations for homogeneity (a) and (b) and heterogeneity (c) and (d). Each subgraph presents the 
bias, RMSE, coverage proportions of the 95% credible intervals of ACEs S . Nine combinations of N R and N 
are shown in (a) and (b); and nine combinations of N and d (“1” for .01, “2” for .025, “3” for .05) are shown in 
(c) and (d). 
























